1.
IEEE Trans Image Process ; 32: 5509-5523, 2023.
Article in English | MEDLINE | ID: mdl-37773904

ABSTRACT

Ingredient prediction has received increasing attention, aided by advances in image processing, owing to its diverse real-world applications such as nutrition intake management and cafeteria self-checkout systems. Existing approaches mainly focus on multi-task food category-ingredient joint learning, improving final recognition by introducing task relevance, while seldom making good use of the inherent characteristics of ingredients themselves. In fact, ingredient prediction poses two issues. First, compared with fine-grained food recognition, ingredient prediction needs to extract more comprehensive features of the same ingredient and more detailed features of different ingredients from different regions of the food image, since these help to understand varied food compositions and to distinguish differences within ingredient features. Second, ingredient distributions are extremely unbalanced, and existing loss functions cannot simultaneously handle the imbalance between positive and negative samples within each ingredient class and the significant differences among classes. To solve these problems, we propose a novel framework named Class-Adaptive Context Learning Network (CACLNet) for ingredient prediction. To extract more comprehensive and detailed features, we introduce Ingredient Context Learning (ICL), which reduces the negative impact of complex backgrounds in food images and constructs internal spatial connections among ingredient regions of food objects in a self-supervised manner, strengthening the connections among the same ingredients through region interactions. To address class imbalance among ingredients, we propose a novel Class-Adaptive Asymmetric Loss (CAAL) that focuses on the various ingredient classes adaptively. Moreover, considering that over-suppressing negative samples leads to over-fitting on the positive samples of rare ingredients, CAAL relaxes this continuous suppression according to gradient-based imbalance ratios while preserving the contribution of positive samples through lesser suppression. Extensive evaluation on two popular benchmark datasets (Vireo Food-172, UEC Food-100) demonstrates that our proposed method achieves state-of-the-art performance. Further qualitative analysis and visualizations show the effectiveness of our method. Code and models are available at https://123.57.42.89/codes/CACLNet/index.html.
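The abstract does not spell out CAAL's exact formulation. As a rough illustration only, the sketch below adapts the well-known asymmetric (focal-style) multi-label loss with a hypothetical per-class weight derived from positive/negative counts; every name and the weighting scheme are assumptions, not the paper's design.

```python
# Illustrative sketch of a class-adaptive asymmetric loss (NOT the paper's
# exact CAAL): asymmetric focusing plus a hypothetical per-class weight that
# suppresses negatives of rare ingredients less aggressively.
import torch
import torch.nn as nn

class ClassAdaptiveAsymmetricLoss(nn.Module):
    def __init__(self, pos_counts, neg_counts, gamma_pos=0.0, gamma_neg=4.0):
        super().__init__()
        # Per-class imbalance ratio: rarer ingredients -> larger ratio ->
        # smaller weight on their negative-sample loss (assumed scheme).
        ratio = neg_counts.float() / pos_counts.float().clamp(min=1.0)
        self.register_buffer("neg_weight",
                             (1.0 / torch.log1p(ratio)).clamp(max=1.0))
        self.gamma_pos = gamma_pos
        self.gamma_neg = gamma_neg

    def forward(self, logits, targets):
        # logits, targets: (batch, num_ingredients), targets in {0, 1}
        p = torch.sigmoid(logits)
        # Asymmetric focusing: easy negatives are down-weighted more strongly
        # (gamma_neg > gamma_pos), so positives keep their gradient signal.
        loss_pos = targets * (1 - p).pow(self.gamma_pos) \
                   * torch.log(p.clamp(min=1e-8))
        loss_neg = (1 - targets) * p.pow(self.gamma_neg) \
                   * torch.log((1 - p).clamp(min=1e-8)) * self.neg_weight
        return -(loss_pos + loss_neg).mean()
```

Setting gamma_neg well above gamma_pos is what makes the loss asymmetric: abundant easy negatives contribute little, which mirrors the abstract's goal of avoiding the over-suppression that harms rare ingredient classes.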

2.
Food Chem ; 424: 136309, 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37207601

ABSTRACT

With the development of deep learning, vision-based food nutrition estimation is gradually entering public view for its advantages in accuracy and efficiency. In this paper, we designed an RGB-D fusion network that integrates multimodal feature fusion (MMFF) and multi-scale fusion for vision-based nutrition assessment. MMFF performs effective feature fusion via a balanced feature pyramid and a convolutional block attention module, while multi-scale fusion merges features of different resolutions through a feature pyramid network. Both enhance the feature representation and improve model performance. Compared with state-of-the-art methods, the mean percentage mean absolute error (PMAE) of our method reached 18.5%. The PMAE for calories and mass reached 15.0% and 10.8% via the RGB-D fusion network, improved by 3.8% and 8.1%, respectively. Furthermore, this study visualized the estimation results for four nutrients and verified the validity of the method. This research contributes to the development of automated food nutrient analysis (code and models can be found at http://123.57.42.89/codes/RGB-DNet/nutrition.html).
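For readers unfamiliar with the reported metric, here is a minimal sketch of PMAE as it is commonly defined in nutrition-estimation work (mean absolute error normalized by the mean ground-truth value, expressed as a percentage); the paper may normalize slightly differently.

```python
# Minimal PMAE sketch (assumed definition: MAE / mean ground truth * 100).
import numpy as np

def pmae(pred, target):
    """Percentage mean absolute error for one nutrient (e.g. calories)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return 100.0 * np.abs(pred - target).mean() / target.mean()

# Example: predicted vs. ground-truth calories for three dishes.
print(pmae([480, 310, 150], [500, 300, 170]))  # ~5.2%
```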


Subjects
Deep Learning , Food Analysis , Nutrients , Nutritive Value
3.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9932-9949, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37021867

ABSTRACT

Food recognition plays an important role in food choice and intake, which is essential to human health and well-being. It is thus of importance to the computer vision community and can further support many food-oriented vision and multimodal tasks, e.g., food detection and segmentation and cross-modal recipe retrieval and generation. Unfortunately, while generic visual recognition has advanced remarkably thanks to released large-scale datasets, progress in the food domain still lags far behind. In this paper, we introduce Food2K, the largest food recognition dataset, with 2,000 categories and over 1 million images. Compared with existing food recognition datasets, Food2K surpasses them by one order of magnitude in both categories and images, establishing a new challenging benchmark for developing advanced models for food visual representation learning. Furthermore, we propose a deep progressive region enhancement network for food recognition, which mainly consists of two components: progressive local feature learning and region feature enhancement. The former adopts improved progressive training to learn diverse and complementary local features, while the latter utilizes self-attention to incorporate richer multi-scale context into local features for further enhancement. Extensive experiments on Food2K demonstrate the effectiveness of our proposed method. More importantly, we have verified the better generalization ability of models trained on Food2K in various tasks, including food image recognition, food image retrieval, cross-modal recipe retrieval, and food detection and segmentation. Food2K can be further explored to benefit more food-relevant tasks, including emerging and more complex ones (e.g., nutritional understanding of food), and models trained on Food2K can be expected to serve as backbones that improve the performance of such tasks. We also hope Food2K can serve as a large-scale fine-grained visual recognition benchmark and contribute to the development of large-scale fine-grained visual analysis.
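The abstract describes region feature enhancement as self-attention over multi-scale local features. The sketch below shows that general idea in isolation; the layer sizes, pooling scheme, and module name are assumptions, not the paper's exact architecture.

```python
# Rough sketch of self-attention-based region feature enhancement: region
# features attend to each other and are added back residually. Purely
# illustrative; dimensions and design details are assumed.
import torch
import torch.nn as nn

class RegionFeatureEnhancement(nn.Module):
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, region_feats):
        # region_feats: (batch, num_regions, dim), e.g. local features pooled
        # from several backbone stages and projected to a common dimension.
        enhanced, _ = self.attn(region_feats, region_feats, region_feats)
        return self.norm(region_feats + enhanced)  # residual connection

feats = torch.randn(2, 16, 512)  # 16 region features per image
out = RegionFeatureEnhancement()(feats)
print(out.shape)  # torch.Size([2, 16, 512])
```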


Subjects
Algorithms , Benchmarking , Humans , Learning